Rice is the primary staple food of the Indonesian people, and consumption is 
increasing, so rice production needs to increase as well. 
Drought is one of the problems that can hamper rice production. 
This research aims to determine which extraction feature, the 
normalized difference vegetation index (NDVI) or the normalized 
difference water index (NDWI), better describes the dryness of rice fields. 
NDVI and NDWI are compared using data originating from Sentinel-2A and 
retrieved via Google Earth Engine, and regression algorithms are used to 
predict drought in paddy fields. This research shows that NDVI is better than NDWI 
in predicting drought using the random forest regression and logistic 
regression algorithms. With the random forest regression algorithm, the 
average root mean square error (RMSE) is 0.018 for NDVI and 0.012 for 
NDWI. With the logistic regression algorithm, the average RMSE is 0.346 
for NDVI and 0.336 for NDWI. Based on the RMSE results, 
the forecasting ability of the random forest regression 
algorithm is better than that of logistic regression. 

1. INTRODUCTION 

In Indonesia, rice drought can occur every year due to El Nino and can significantly impact the 
agricultural sector in several Indonesian regions [1]. Measuring the resilience of farmers and communities 
facing drought, and identifying the factors that influence it, produces indicators from which policy 
implications can be summarized; the determining factors can also be identified by applying a livelihoods 
approach. Farmers’ resilience to drought can be strengthened by easier access to credit, easy equipment 
rental, and technical efficiency in rice production [2]. Drought can significantly reduce crop yields, and the 
reduced production leads to higher prices for consumers [3]. It also increases production costs, which can 
affect the economic sector [4]. Drought on agricultural land can significantly impact the economy, 
politics, and technology, especially at high severity, when it creates enormous losses [5]. During the 
rice-growing season an adequate irrigation system is required, but drought can occur at any time. Climate 
change currently causes rainfall patterns to differ from year to year and from region to region [6]. 

In this research, the moisture content in the vegetative and generative phases was compared in order 
to predict the ripening phase, because the water content differs in each phase and greatly affects rice 
growth and, ultimately, grain production. The vegetative phase is characterized by active tillering, 
a gradual increase in plant height, and the periodic emergence of leaves. The reproductive phase is 
characterized by stem elongation, a decrease in the number of tillers, booting, the appearance of the flag 
leaf, heading, and flowering. The reproductive phase lasts about 30 days on average in most 
cultivars. Its initial stage, internode elongation or jointing, varies slightly according to 
cultivar and weather conditions. During the ripening phase, the size and weight of the grains increase as 
starch and sugars are translocated from the leaf sheaths and stems. The grain turns golden, and the rice 
leaves begin to senesce [7]. Drought conditions can decrease grain yield per clump and affect 
chlorophyll content and the chlorophyll a/b ratio, while increasing proline and total sugar accumulation [8]. 

Remote sensing is the observation of an object using a device remotely [9]. Sentinel-2 carries 13 
spectral bands and has a swath width of 290 km. Each of the Sentinel-2 constellation satellites 
has a repeat cycle of 10 days, and with both satellites fully operational, a 5-day revisit time is achieved 
at the equator [10]. According to the literature on Sentinel-2A vegetation index values, five classes 
are taken from the three main periods: land preparation, early vegetative, late vegetative, generative, and 
harvest/ripening [11]. Thus, Sentinel-2 imagery can be used to classify rice field cover [12]. Moreover, 
Google Earth Engine supports processing the imagery while excluding cloud-covered scenes. The imagery used 
comes from the Sentinel-2A satellite because it is freely accessible through Google Earth Engine 
and revisits the same location every ten days [13], so that monitoring can be done faster. 

The threat of drought has hit some areas in Kebumen Regency, and farmers, in collaboration with the 
district government, have started looking for alternative water sources that will eventually 
be distributed using a pumping system. A systematic and efficient effort is therefore needed to reduce the 
risk of loss from this threat, using the normalized difference water index (NDWI) and the normalized 
difference vegetation index (NDVI). NDVI can be derived from specific cloud-free images, which may not be 
available for a given period when relying on sensors with a high spatial resolution. NDWI is a remote sensing 
index that is sensitive to changes in water content [13]. Combining near-infrared (NIR) and short wave 
infrared (SWIR) removes variations caused by the leaves’ internal structure and dry matter content, 
which improves the accuracy of vegetation water content estimates [14]. 

In 1995, Tin Kam Ho proposed the random forest (RF) in his research entitled random decision 
forest [15]; in 2001, it was redeveloped by Leo Breiman and then patented [16]. The random forest 
regression algorithm is an ensemble learning method that combines many regression trees. A regression tree 
can be represented as a set of conditions arranged hierarchically from the root to the leaves of the tree [17]. 
The logistic regression (LR) algorithm is a mathematical modeling approach that can describe the 
relationship between several variables. So far, logistic regression is the most widely used modeling procedure 
for epidemiological data analysis [18]. The random forest algorithm consists of trees that are 
grown with user-specified values, and its result is obtained by averaging the numerical predictions of the 
trees. The random forest predictor is formed by averaging over k trees [19], while the 
logistic regression algorithm describes the relationship of multiple Xs to a dichotomous dependent variable [18]. 
The contribution of this research is to compare the NDVI and NDWI extraction features and to compare random 
forest regression with logistic regression for predicting drought in the ripening phase using the Sentinel-2 
satellite. 


2. PREVIOUS RESEARCH 

Previous research can be summarized as follows. The NDVI method is widely applied in remote sensing 
research to determine the density of vegetation. One study explains the phenology of rice using 
Sentinel-2A imagery with NDVI to determine the beginning and end of the rice planting period, making 
it easier to monitor rice field conditions and to improve plant size predictions in a short time [12]. 
Another study that uses the NDVI method estimates rice productivity based on NDVI wave 
characteristics and a regression between NDVI and rice productivity [20]. Subsequent 
research aims to: i) develop a phenology-based Landsat scheme to 
identify paddy fields during two phenological phases (flooding/transplantation and ripening) at a regional scale; 
and ii) systematically evaluate the accuracy and resultant uncertainty of the Landsat-based rice field map [21]. 

With Landsat 8, NDVI has been used to map various irrigated crops in highly fragmented, small, and 
heterogeneous agricultural landscapes [22]. NDVI has also been used with variograms of Landsat 8 
operational land imager (OLI) time series, namely NDVI, NIR, and red images, to model the spatial 
heterogeneity of agricultural land at various stages of growth [23]. Among the related research, five studies 
use the NDWI extraction feature. The first used NDWI to monitor drought [24]. Another used NDWI for 
mapping vegetation moisture content [25]. NDWI is also used for detecting changes in surface water [26]. 
Another NDWI study evaluated vegetation cover types [27]. The NDWI method is also used 
for monitoring drought in vegetation [28]. Furthermore, this research also uses the random 
forest regression algorithm to predict drought on land. The NDVI and NDWI extraction features need to be 
compared in their ability to detect drought. This research predicts the moisture content of land experiencing 
drought using random forest regression [29]. 


3. RESEARCH METHOD 
3.1. Research stages 

This section describes the approach used to answer the research objectives. Figure 1 shows the research 
stages, starting with the identification of rice fields in Kebumen, Central Java. Data were then collected 
from Sentinel-2A, and pre-processing was carried out, starting with atmospheric correction, followed by 
Sentinel-2 reflectance, and then a sampling strategy on the rice fields with a zoning size of 160x154 for 
sample acquisition. The next process is feature extraction using NDVI and NDWI to indicate water content 
and compare drought. The modeling process then applies random forest regression and logistic regression 
to the feature extraction results. Finally, the model is evaluated using root mean square error (RMSE) 
and out of bag (OOB) to assess the level of accuracy, which yields a comparison of predictions and, in the 
end, a comparative analysis. 

Figure 1. Research stages 

3.2. Research area and data collection 

The research location is a rice field area in the Kebumen Regency, Central Java Province, 
the largest rice-producing area with 2174 hectares. Geographically, this area of interest (AOI) lies at 
coordinates 109.699456004, 109.745133512, -7.772345033, -7.728641145 [EPSG: 4326]. The research data 
were obtained from Sentinel-2 imagery from 1 March 2020 to 15 September 2020. From that 
period, only imagery without cloud cover over the area was used, so that the land can be seen. The imagery 
obtained for each period goes through the pre-processing stage, which clips it to the land used as the 
research site. Data from June 2020 and August 2020 were not used because 80% of the research object was 
covered by clouds; this research only uses images with a cloud cover of at most 10%. 

Figure 2 shows the research location in Kebumen, Central Java. The soils of 
Kebumen Regency can be distinguished into alluvial soil, latosol soil, podsolic soil, regosol soil, gray glei 
humus and alluvial associations, and litosol and brown mediterranean associations. The potential of 
the land shows that some of the areas are fertile enough to be used as agricultural land. 
However, several sub-districts such as Sempor, Karanganyam, Sadang and Alian have soil characteristics that 
are less suitable for agricultural land [30]. 





Figure 2. Research area in Kebumen 

3.3. Pre-processing 

The pre-processing stage is where the data are prepared before they are processed. Raster data 
usually cover an area far larger than the area to be researched, so it is essential to cut the data to the 
research area, a step commonly known as clipping [31]. The area of interest defines the 
clipped region and can be specified by points or shapes based on coordinates; the clipping procedure 
follows the shape of the defined area. These steps are carried out in Google Earth Engine: 
atmospheric correction, followed by Sentinel-2 reflectance, so that a sampling strategy is obtained for the 
fields. After clipping, the array size became smaller, namely 160x54 for sample acquisition. 
Medium-resolution satellite imagery was used, and because the land at the research site was covered by thin 
clouds, images were selected to give better results and to minimize noise from an imbalanced dataset. This 
increases accuracy by reducing errors, especially for predictive models. One challenge is developing a general 
auto-exposure solution that covers a wide range of imaging sensors [32] with a fast and robust camera 
auto-exposure algorithm [33]. 
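
The clipping step can be illustrated with a minimal sketch in plain Python. It assumes a north-up raster stored as a list of rows with a known geographic extent; the function name `clip_to_aoi`, the toy grid, and its extent are hypothetical and are not taken from the study, where clipping was performed in Google Earth Engine.

```python
import math


def clip_to_aoi(raster, extent, aoi):
    """Clip a north-up raster (list of rows) to a bounding box.

    extent and aoi are (min_lon, max_lon, min_lat, max_lat) tuples,
    the same ordering as the AOI coordinates given in Section 3.2.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    min_lon, max_lon, min_lat, max_lat = extent
    a_min_lon, a_max_lon, a_min_lat, a_max_lat = aoi
    x_res = (max_lon - min_lon) / n_cols   # degrees per column
    y_res = (max_lat - min_lat) / n_rows   # degrees per row

    # Columns grow eastwards; rows grow southwards from the top edge.
    col_start = max(0, int((a_min_lon - min_lon) / x_res))
    col_stop = min(n_cols, math.ceil((a_max_lon - min_lon) / x_res))
    row_start = max(0, int((max_lat - a_max_lat) / y_res))
    row_stop = min(n_rows, math.ceil((max_lat - a_min_lat) / y_res))
    return [row[col_start:col_stop] for row in raster[row_start:row_stop]]


# Hypothetical 4x4 grid covering 0..4 degrees in both directions.
toy_raster = [[r * 4 + c for c in range(4)] for r in range(4)]
clipped = clip_to_aoi(toy_raster, (0.0, 4.0, 0.0, 4.0), (1.0, 3.0, 1.0, 3.0))
```

The clipped array is smaller than the input, which mirrors the reduction in array size described above; a bounding box is the simplest case, and an arbitrary polygon would additionally require a per-pixel mask.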


3.4. Extraction feature 
3.4.1. NDVI 

NDVI is a vegetation index that indicates vegetation density and the level of plant 
health. NDVI is also used to measure the greenness of vegetation. NDVI is sensitive to photosynthetic 
activity by chlorophyll, so the NDVI value can be used to classify vegetation. NDVI is 
obtained from the normalized ratio of the red (RED) and NIR bands [34]: 


$$\mathrm{NDVI} = \frac{\text{Band 8A} - \text{Band 4}}{\text{Band 8A} + \text{Band 4}} \tag{1}$$


Equation (1) computes NDVI from band 4 (RED) and band 8 (NIR, 10-m resolution) or band 8A 
(NIR, 20-m resolution) of Sentinel-2A [35]. NDVI is also commonly used in drought monitoring, 
agricultural production forecasting, fire-prone zone forecasting, and desert encroachment mapping all over 
the world. The amount of historical data available can affect the forecasting results [36]. Because it 
compensates for changes in lighting conditions, surface slope, exposure, and other external factors, NDVI is 
becoming more commonly used in global vegetation monitoring. 


3.4.2. NDWI 

The NDWI method combines NIR and SWIR to determine water status. These two bands are used 
because both are located in a region of high reflectance and penetrate deep into the vegetation 
canopy [37]. NDWI can effectively enhance water 
information in most cases. In (2), the NIR band is band 8A and the SWIR band is band 11 on 
Sentinel-2A, while the RED band used in (1) is band 4 [37]. 


$$\mathrm{NDWI} = \frac{\text{Band 8A} - \text{Band 11}}{\text{Band 8A} + \text{Band 11}} \tag{2}$$
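
Equations (1) and (2) share the same normalized-difference form, so both indices can be computed with one helper. The following is a minimal sketch in plain Python; the function name `normalized_difference` and the three-pixel reflectance values are hypothetical illustrations, not data from this study.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel; None where the sum is zero."""
    result = []
    for a, b in zip(band_a, band_b):
        total = a + b
        result.append((a - b) / total if total != 0 else None)
    return result


# Hypothetical surface reflectance values for three pixels (not study data).
b4_red = [0.10, 0.05, 0.20]
b8a_nir = [0.30, 0.45, 0.20]
b11_swir = [0.10, 0.15, 0.30]

ndvi = normalized_difference(b8a_nir, b4_red)    # equation (1)
ndwi = normalized_difference(b8a_nir, b11_swir)  # equation (2)
```

Both indices are bounded between -1 and 1 for non-negative reflectance; the zero-sum guard is an assumption added here to avoid division by zero on empty pixels.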


3.5. Modeling and evaluation prediction 

The random forest regression algorithm is an ensemble learning algorithm that combines many 
regression trees. A regression tree is a set of boundaries or conditions arranged hierarchically and applied 
sequentially from the root of the tree to its leaves [38]-[40]. The random forest is the approach used here 
to solve the prediction problem. The random forest method is one of the decision tree methods. A decision 
tree is a flowchart shaped like a tree, with a root node that receives the data, inner nodes below the root 
node that contain questions about the data, and leaf nodes that give the decision. A random forest consists 
of a collection of decision trees, as in (3) [41]. 


$$\{h(x, \theta_t),\ t = 1, 2, 3, \ldots, N\} \tag{3}$$


In (3), \theta_t is an independently distributed random variable, x is the input variable, and N 
is the total number of regression decision trees. The randomness of the forest is determined when the samples 
are drawn. The estimate obtained from the samples that were not selected is referred to as the out-of-bag 
(OOB) result [41]. For regression, the random forest constructs K regression trees and averages their 
results. After K such trees are grown, the random forest regression predictor is given by (4) [40]. 


$$f(x) = \frac{1}{K} \sum_{k=1}^{K} T_k(x) \tag{4}$$

In (4), x is the input variable, T_k is the k-th tree (k = 1, 2, 3, ..., K), and K is the total 
number of trees in the random forest (the size of the random forest) [40]. Furthermore, the performance of 
the prediction results from the previous stage was evaluated using RMSE to calculate the prediction error [42]. 
RMSE has been used as a primary statistical metric of model performance in meteorology, air 
quality, and climate science. Although such metrics have been used to evaluate model performance for many 
years, there is no agreement on the most suitable metric for model errors. 
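
The averaging of K regression trees and the out-of-bag idea described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: each tree is a one-split stump, only one input feature is used (so the ensemble reduces to bagging, without the random feature selection of a full random forest), and the names `fit_stump`, `fit_forest`, `predict`, and `oob_predictions` as well as the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def fit_forest(xs, ys, n_trees, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (in the bag)
        oob = set(range(n)) - set(idx)              # samples this tree never saw
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        forest.append((tree, oob))
    return forest


def predict(forest, x):
    """Average the predictions of all K trees."""
    return sum(tree(x) for tree, _ in forest) / len(forest)


def oob_predictions(forest, xs):
    """Predict each training sample using only trees that did not see it."""
    preds = []
    for i, x in enumerate(xs):
        votes = [tree(x) for tree, oob in forest if i in oob]
        preds.append(sum(votes) / len(votes) if votes else None)
    return preds


# Hypothetical toy data: a step from 0 to 1 at x = 0.5 (not study data).
xs = [i / 10 for i in range(1, 11)]
ys = [0.0 if x <= 0.5 else 1.0 for x in xs]
forest = fit_forest(xs, ys, n_trees=25)
low, high = predict(forest, 0.05), predict(forest, 0.95)
oob = oob_predictions(forest, xs)
```

Comparing the OOB predictions with the observed values gives an error estimate without a separate validation set, which is why OOB is reported alongside RMSE in this research.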

For simplicity, assume that we have n samples of model errors e, calculated as 
(e_i, i = 1, 2, ..., n). Uncertainties resulting from observation errors or from the methods used to compare 
models and observations are not considered in this research [43]. OOB data are the data not used to grow 
a tree; they represent data outside the sample and are used for cross-validation purposes. An indicator 
is kept for each case to mark whether it is in the bag or OOB [44]. 
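
For n model errors e_i, RMSE is the square root of the mean of the squared errors. A minimal sketch in plain Python follows; the function name `rmse` and the sample values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over paired samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [o - p for o, p in zip(observed, predicted)]  # e_i, i = 1..n
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical values: errors of 3 and 4 give sqrt((9 + 16) / 2).
example = rmse([0.0, 0.0], [3.0, 4.0])
```

Because the errors are squared before averaging, RMSE penalises large errors more heavily than small ones, and a lower value indicates a smaller prediction error.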

In this research, the logistic regression (LR) algorithm, which is based on the natural logarithm, is 
used as a regression function of the predictors and compared with random forest. Logistic regression is an 
approach to building predictive models, like linear regression, which is commonly referred to as ordinary 
least squares (OLS) regression. The difference is that in logistic regression the researcher predicts a 
dependent variable on a dichotomous scale. With one predictor, X, the model takes the form of (5) [45]. 


$$\ln[\mathrm{odds}(Y = 1)] = \beta_0 + \beta_1 X \tag{5}$$


In (5), ln stands for the natural logarithm, Y is the outcome, with Y = 1 when the event 
occurs (versus Y = 0 when it does not), \beta_0 is the intercept term, and \beta_1 is the regression 
coefficient, the change in the logarithm of the event odds for a 1-unit change in predictor X [45]. OLS 
requires the assumption that residual errors are normally distributed. Logistic regression, by contrast, 
does not need this assumption because it follows the logistic distribution. 
If the dependent variable has more than two categories, the appropriate logistic 
regression model is multinomial logistic regression. 
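
The log-odds form of logistic regression with one predictor can be inverted to obtain the event probability, P(Y = 1) = 1 / (1 + exp(-(intercept + coefficient * X))). The sketch below shows this in plain Python; the function names `event_probability` and `odds_ratio` and the coefficient values are hypothetical, not fitted values from this study.

```python
import math


def event_probability(x, beta0, beta1):
    """Invert ln[odds(Y = 1)] = beta0 + beta1 * x to get P(Y = 1)."""
    log_odds = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(beta1):
    """Multiplicative change in the odds for a 1-unit increase in x."""
    return math.exp(beta1)


# Hypothetical coefficients: log-odds of -2 + 1 * x are zero at x = 2.
p_at_two = event_probability(2.0, beta0=-2.0, beta1=1.0)
```

A log-odds of zero corresponds to a probability of 0.5, and each 1-unit increase in X multiplies the odds by exp(coefficient), which is the usual way of interpreting the regression coefficient.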


4. RESULT AND DISCUSSION 

Based on the NDVI visualization shown in Figure 3, drought occurred 
in March 2020 in the area of interest (AOI) in Kebumen, 
Central Java. Figure 3 shows the results of the preprocessing carried out with the NDVI and NDWI 
indices. 

Figure 3 shows the preprocessing results for March, April, May, July, and September 2020, obtained 
by clipping according to the research location. Figure 3 is divided into two parts, namely 
NDVI preprocessing in Figure 3(a) and NDWI preprocessing in Figure 3(b). The figure uses band 4, band 8A, 
and band 11 for the extraction features and excludes cloud-covered data to obtain better values. 

Figure 3(a) shows the results using NDVI. NDVI is divided into six class categories: non-vegetation, 
lowest dense, lower dense, dense, higher dense, and highest dense. Vegetation has the potential to store 
biomass and carbon, so the presence of vegetation can indicate the size of carbon and biomass stocks [46]. 
The NDVI coloring has an index sensitivity that tends to be less suited to detecting water content. 

Figure 3(b) shows the results using NDWI. NDWI uses the same categories as Figure 3(a) to obtain 
preprocessing results, which are compared in parallel to monitor the tested land. The comparison between 
NDVI and NDWI can be seen in Figure 4. 

The vegetation index preprocessing results are based on the NDVI index with a range of 
0 to 1. This index describes the greenness level of a plant. The vegetation index is a mathematical combination 
of the red band and the NIR band that indicates the presence and condition of vegetation; in this case, the 
index range is used to determine the moisture content at the location being tested, which is then plotted 
to show the actual values from data processing, as seen in Figure 4. 

Figure 4 compares the NDVI and NDWI index values. The higher the 
water content, the closer the extraction feature value is to 1, and vice versa: the lower the water 
content, the closer the extraction feature value is to 0. NDVI appears to be better at predicting the 
level of dryness in rice fields; the results showed that NDVI performed better than NDWI for drought. 
According to Table 1, NDVI is divided into six class categories: non-vegetation, lowest dense, lower dense, 
dense, higher dense, and highest dense [46]. 
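
The six NDVI classes can be assigned with a simple threshold lookup based on the ranges in Table 1. The sketch below is an illustration only: the function name `ndvi_dense_class` is hypothetical, and because adjacent ranges in Table 1 share their end points, the choice of which class receives a boundary value is an assumption made here.

```python
def ndvi_dense_class(ndvi):
    """Map an NDVI value to the dense class listed in Table 1.

    Assumption: lower bounds are inclusive and upper bounds exclusive,
    except that 0.6 itself is counted as "Higher dense".
    """
    if ndvi < 0:
        return "Non vegetation"
    if ndvi < 0.15:
        return "Lowest dense"
    if ndvi < 0.3:
        return "Lower dense"
    if ndvi < 0.45:
        return "Dense"
    if ndvi <= 0.6:
        return "Higher dense"
    return "Highest dense"


# Hypothetical NDVI values (not study data).
labels = [ndvi_dense_class(v) for v in (-0.1, 0.1, 0.2, 0.4, 0.5, 0.7)]
```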

After the index values are obtained, they need to be evaluated statistically to monitor the drought; the 
results are shown in Table 2. It gives the evaluation results using RMSE to compare errors and detect noise 
in the dataset used, together with the RF (OOB) and LR scores to see the percentage of correct predictions. 
The scaling factor cannot change adaptively after training, but it can learn model patterns and averages in 
the training set [47]. The data were split into training and testing sets of 80% and 20% to guide the 
modeling toward better local optima [48]. For NDVI, the average RF (OOB) value is 0.988 with an RMSE of 
0.018, while the average LR value is 0.952 with an RMSE of 0.346. For NDWI, the average RF (OOB) value is 
0.99 with an RMSE of 0.012, while the average LR value is 0.946 with an RMSE of 0.336. Based on these results, 
the prediction evaluation on NDVI is better than on NDWI. From the vegetation index and 
the algorithm results, it can be seen that NDVI better represents high vegetation levels, shown with blue 
coloring. Furthermore, the results indicate that the average RF and LR values are 
higher when the index is high. The RMSE value for NDVI is 0.018, indicating that NDVI is better in terms of 
evaluation than NDWI, which has an RMSE value of 0.012. 

Figure 3. Preprocessing for March, April, May, July, and September 2020: (a) NDVI and (b) NDWI 

Figure 4. Comparison vegetation index value 


In Figure 4, the NDVI level decreased from March 2020 to September 2020, 
while NDWI also decreased over the same period, with only a slight increase in July 
2020. The vegetation levels are explained in Table 1. 


Table 1. The index value of vegetation 
No.  Dense class      NDVI       Hex code 
1    Non vegetation   < 0        #ffffff 
2    Lowest dense     0-0.15     #d1e3f3 
3    Lower dense      0.15-0.3   #9ac8e1 
4    Dense            0.3-0.45   #529dcc 
5    Higher dense     0.45-0.6   #1c6cb1 
6    Highest dense    > 0.6      #08306b 

Table 2. Evaluation prediction of NDVI and NDWI 
            NDVI                             NDWI 
Months      RMSE   RF(OOB)  RMSE   LR        RMSE   RF(OOB)  RMSE   LR 
March       0.02   0.99     0.35   0.96      0.01   0.98     0.39   0.96 
April       0.01   0.99     0.39   0.93      0.01   0.99     0.43   0.92 
May         0.01   0.98     0.30   0.92      0.01   0.99     0.13   0.95 
July        0.02   0.99     0.30   0.95      0.02   0.99     0.43   0.94 
September   0.03   0.99     0.39   0.95      0.01   1        0.30   0.96 
Average     0.018  0.988    0.346  0.952     0.012  0.99     0.336  0.946 
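
The average RMSE values in the last row of Table 2 follow directly from the five monthly values. The short sketch below reproduces that arithmetic in plain Python; the monthly values are transcribed from Table 2, while the dictionary layout and the helper name `column_average` are hypothetical.

```python
# Monthly RMSE values from Table 2 (March, April, May, July, September).
rmse_by_model = {
    ("NDVI", "RF"): [0.02, 0.01, 0.01, 0.02, 0.03],
    ("NDVI", "LR"): [0.35, 0.39, 0.30, 0.30, 0.39],
    ("NDWI", "RF"): [0.01, 0.01, 0.01, 0.02, 0.01],
    ("NDWI", "LR"): [0.39, 0.43, 0.13, 0.43, 0.30],
}


def column_average(values):
    """Arithmetic mean of a column, rounded to three decimals."""
    return round(sum(values) / len(values), 3)


averages = {key: column_average(vals) for key, vals in rmse_by_model.items()}
```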





5. CONCLUSION 

This research concludes that the NDVI extraction feature is better than the NDWI 
extraction feature in predicting drought. Drought prediction is carried out by applying the extraction 
features to Sentinel-2 satellite image data. The extracted features are then 
processed using the random forest regression algorithm and the logistic regression algorithm to predict 
drought in rice fields. The predictions were evaluated using RMSE, RF (OOB), and LR accuracy. 
For NDVI, the average RF (OOB) value is 0.988 with an RMSE of 0.018, and the 
average LR value is 0.952 with an RMSE of 0.346. For NDWI, the average RF (OOB) value is 0.99 
with an RMSE of 0.012, and the average LR value is 0.946 with an RMSE of 0.336. Based on these 
results, the evaluation of NDVI is better than that of NDWI. Further research should compare 
other extraction features, such as the enhanced vegetation index (EVI), the normalized difference moisture 
index (NDMI), the soil adjusted vegetation index (SAVI), and other features related to the greenness of 
vegetation. To strengthen the prediction results, further evaluation is also needed using the explained 
variance score (EVS), R squared (R²), mean squared error (MSE), and mean absolute error (MAE). 